Using a Data Metric for Preprocessing Advice
Abstract
This paper describes research carried out within a project in which a methodology for providing user support for KDD processes plays a central role. Although methodologically we aim to support the whole process of applying inductive learning techniques, this paper focuses on one part of that process: support for data preprocessing in KDD. We give some insight into the metadata we calculate from a dataset as part of the method for user support. DCT (Data Characterisation Tool) is implemented in a software environment (Clementine). Some examples are given that resulted from running the UGM/DCT (User Guidance Module combined with DCT) on the data.
Publication date: 1998